Developing a POS tagger for Magahi: A Comparative Study

Authors

Ritesh Kumar

Bornini Lahiri

Deepak Alok

Abstract

In this paper, we present a comparative study of four state-of-the-art sequential taggers applied to Magahi data for part-of-speech (POS) annotation. Magahi is one of the smaller Indo-Aryan languages, spoken in the eastern Indian state of Bihar. It is an extremely resource-poor language, and this work is the first attempt to develop any Natural Language Processing (NLP) resource for it.
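The abstract does not name the four taggers being compared. Purely as a point of reference, the sketch below shows one classic kind of sequential tagger, a bigram hidden Markov model decoded with the Viterbi algorithm, together with token-level accuracy, a common metric for comparing taggers. The toy English corpus, the function names (`train_hmm`, `viterbi`, `accuracy`), and the add-one smoothing are illustrative assumptions; this is not the authors' setup, data, or any of the taggers they evaluated.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English words, coarse tags) used only to show the
# mechanics. It is NOT Magahi data and NOT the paper's corpus or tagset.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]


def train_hmm(sentences):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log probability of `key` in `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unseen words
    # best[i][t] = (log score of best path ending in tag t at word i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Trace the backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(gold_sents, pred_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold_sents, pred_sents)
             for (_, g), p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    gold = [[("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")]]
    pred = [viterbi([w for w, _ in s], *model) for s in gold]
    print(pred)                  # [['DET', 'NOUN', 'VERB']]
    print(accuracy(gold, pred))  # 1.0
```

Add-one smoothing keeps an unseen word or transition from zeroing out every path, so a new word such as "bird" in "the bird runs" is still tagged NOUN from its context. Practical taggers generally replace this with stronger smoothing and unknown-word features (for example, suffixes), which matters especially for a resource-poor language with a small training corpus.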

Similar resources

A comparative study of the effect of part-of-speech tagging on parsing in the automatic processing of Persian

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

A part-of-speech tagging system for the Persian language

Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Fast Development of Basic NLP Tools: Towards a Lexicon and a POS Tagger for Kurmanji Kurdish

The development of basic NLP resources for minority languages is still a challenge to both formal and computational linguists. In this paper, we show how we were able to develop a medium-scale morphological lexicon for Kurmanji Kurdish in a few days time using only freely accessible resources. We also developed a preliminary POS tagger that shall be used as a pre-annotation tool for developing ...

Fast or Accurate? - A Comparative Evaluation of PoS Tagging Models

We perform a comparison of 22 PoS tagger models for English and German offered by 9 different implementations. By evaluating on a mix of corpora from different domains, we simulate a black-box usage where researchers select a tagger (because of popularity, ease of use, etc.) and apply it to all sorts of text. We find the expected trade-off between fast models with relatively low accuracy and sl...

Publication date: 2012